Telling English Tweets Apart: the Case of US, GB, AU

Authors

  • Asmelash Teka Hadgu
  • Netaya Lotze
  • Robert Jäschke
Abstract

In this paper, we study how to automatically tell different varieties of English apart on Twitter, using samples of American (US), British (GB) and Australian (AU) English. We track cities and apply filters to generate ground-truth data. We perform an expert evaluation to get a sense of the difficulty of the task. We then cast the problem as a classification task: given a tweet (or a set of tweets from a user) in English, the goal is to automatically identify whether the tweet (or set of tweets) is US, GB or AU English. We perform experiments to compare some linguistic features against simple statistical features and show that character n-grams are quite effective for the task. Our work is closely related to sociolinguistics, especially research on diatopic varieties, linguistic landscapes, and World Englishes [5].
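The classification setup described in the abstract can be illustrated with a small sketch. The abstract does not say which classifier or n-gram length the authors used, so the multinomial naive Bayes model, the trigram length, the names `char_ngrams` and `NgramNaiveBayes`, and the three training sentences below are all assumptions made up for illustration. They are not the paper's method or data.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Return overlapping character n-grams of a lowercased, space-padded text."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramNaiveBayes:
    """Multinomial naive Bayes over character n-gram counts (add-one smoothing).

    Hypothetical stand-in classifier; the abstract does not name the model used.
    """

    def __init__(self, n=3):
        self.n = n
        self.counts = defaultdict(Counter)  # label -> n-gram counts
        self.docs = Counter()               # label -> number of training texts
        self.vocab = set()                  # all n-grams seen in training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.counts[label].update(grams)
            self.vocab.update(grams)
            self.docs[label] += 1
        return self

    def predict(self, text):
        total_docs = sum(self.docs.values())
        grams = char_ngrams(text, self.n)
        best_label, best_score = None, -math.inf
        for label in self.docs:
            total = sum(self.counts[label].values())
            # Log prior plus smoothed log likelihood of each n-gram.
            score = math.log(self.docs[label] / total_docs)
            for gram in grams:
                score += math.log(
                    (self.counts[label][gram] + 1) / (total + len(self.vocab))
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy sentences, only to show the mechanics (not data from the paper).
train_texts = [
    "my favorite color is gray",
    "what is your favourite colour",
    "g'day mate, arvo barbie",
]
train_labels = ["US", "GB", "AU"]
model = NgramNaiveBayes(n=3).fit(train_texts, train_labels)
```

Calling `model.predict(tweet_text)` returns one of the labels `"US"`, `"GB"` or `"AU"`. For the user-level variant of the task, one simple option under the same assumptions is to concatenate a user's tweets into a single string before predicting.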





Publication date: 2016